Xiaoshuai Sun

Plan Before Search: Search Agents Need Plan

May 27, 2026

Look on Demand: A Cognitive Scheduling Framework for Visual Evidence Acquisition in Multimodal Reasoning

May 27, 2026

Evading Visual Aphasia: Contrastive Adaptive Semantic Token Pruning for Vision-Language Models

May 10, 2026

Scaling the Long Video Understanding of Multimodal Large Language Models via Visual Memory Mechanism

Mar 31, 2026

Persistent Story World Simulation with Continuous Character Customization

Mar 17, 2026

Test-Time Computing for Referring Multimodal Large Language Models

Feb 23, 2026

MICON-Bench: Benchmarking and Enhancing Multi-Image Context Image Generation in Unified Multimodal Models

Feb 23, 2026

CSMCIR: CoT-Enhanced Symmetric Alignment with Memory Bank for Composed Image Retrieval

Jan 07, 2026

CIR-CoT: Towards Interpretable Composed Image Retrieval via End-to-End Chain-of-Thought Reasoning

Oct 09, 2025

MIHBench: Benchmarking and Mitigating Multi-Image Hallucinations in Multimodal Large Language Models

Aug 01, 2025